Representation learning applications in biological sequence analysis

Abstract

Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze the substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences, while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as inputs for other probabilistic models. Considering the importance and growing trend of the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
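
The sentence/word analogy in the abstract can be illustrated with a minimal Python sketch. It is not taken from the reviewed paper: the function names (`kmer_tokens`, `build_embeddings`, `embed_sequence`), the choice of k = 3, the vector dimension, and the example sequences are all assumptions made for illustration. The vectors here are random placeholders standing in for embeddings that a representation learning model would learn from data.

```python
import random

def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (the 'words' of the 'sentence')."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_embeddings(corpus_tokens, dim=4, seed=0):
    """Assign each distinct k-mer a random vector.

    Assumption: random values are placeholders for vectors that a trained
    representation learning model would produce.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for tokens in corpus_tokens for tok in tokens})
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}

def embed_sequence(tokens, embeddings, dim=4):
    """Average the k-mer vectors to get one fixed-length vector per sequence."""
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for tok in tokens:
        for j, value in enumerate(embeddings[tok]):
            total[j] += value
    return [x / len(tokens) for x in total]

# Hypothetical DNA sequences used only for illustration.
sequences = ["ATGCGTAC", "ATGCGTTT"]
tokenized = [kmer_tokens(s, k=3) for s in sequences]
embeddings = build_embeddings(tokenized, dim=4)
vectors = [embed_sequence(t, embeddings, dim=4) for t in tokenized]
```

Each entry of `vectors` is a fixed-length numeric representation of one sequence, which is the kind of input the abstract describes passing to function or structure estimation or to other probabilistic models.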

Similar resources

Information theory applications for biological sequence analysis

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-...

Hidden Markov Models and their Applications in Biological Sequence Analysis

Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as p...

Biological sequence analysis

This talk will review a little over a decade’s research on applying certain stochastic models to biological sequence analysis. The models themselves have a longer history, going back over 30 years, although many novel variants have arisen since that time. The function of the models in biological sequence analysis is to summarize the information concerning what is known as a motif or a domain in...

Biological Sequence Analysis

Background: The schematic for every living organism is stored in long molecules known as chromosomes, made of a substance known as DNA (deoxyribonucleic acid). Each cell in an organism has a complete copy of its DNA, also known as its genome, which is conveniently modeled as a sequence of symbols (alternately referred to as nucleotides or bases) in the DNA alphabet {A,C,T,G}. In humans, and most mam...

Sequence Complexity for Biological Sequence Analysis

A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward- and reverse-complementary repeats are allowed. The model has a small number of parameters which are fitted to the data. In general there are many expl...

Journal

Journal title: Computational and Structural Biotechnology Journal

Year: 2021

ISSN: 2001-0370

DOI: https://doi.org/10.1016/j.csbj.2021.05.039